Speed up cluster merge by batch copying arrays #280

bouk · 2023-06-15T12:38:18Z

I was looking through the code and found that a lot of time was being spent in 'make clusters', so I took a quick look at why. The merge_clusters function turned out to be inefficient, so I optimized it to do batch memcpys wherever possible.

This gives me a ~20% reduction in time spent for a large image.

Before

After

christian-rauch

Thanks for this impressive speedup. I guess the improvement comes from replacing the loop over source, with its element-wise calls to zarray_add, with a single memcpy?

I think the commit could use an explanation of what is going on and where the speed improvements come from.

christian-rauch · 2023-06-15T20:05:24Z

common/zarray.h

@@ -437,26 +437,36 @@ static inline int zarray_index_of(const zarray_t *za, const void *p)
    return -1;
 }

+/**
+ * Add elements from start up to and excluding end from 'source' into 'dest'.
+ * el_size must be the same for both lists


Should "el_size" here be el_sz instead?

christian-rauch · 2023-06-15T20:07:42Z

common/zarray.h

 static inline void zarray_add_all(zarray_t * dest, const zarray_t * source)
 {
-    assert(dest->el_sz == source->el_sz);
-
-    // Don't allocate on stack because el_sz could be larger than ~8 MB
-    // stack size
-    char *tmp = (char*)calloc(1, dest->el_sz);
-
-    for (int i = 0; i < zarray_size(source); i++) {
-        zarray_get(source, i, tmp);
-        zarray_add(dest, tmp);
-   }
-
-    free(tmp);
+    zarray_add_range(dest, source, 0, source->size);
 }


Does it make sense to keep the function zarray_add_all if it only maps to zarray_add_range? We could also just search & replace all calls to zarray_add_all instead.

merge_clusters was using zarray_add_all, which copied elements over one by one, doing two memcpys and a potential array resize per element. Here we replace it with a range copy that does a single resize and a single memcpy for the whole operation, which is a lot faster. In my testing it reduces the total runtime for a 2000x3000 image by 20%.
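
To make the difference concrete, here is a minimal, self-contained sketch of the two copy strategies. It is not the apriltag code: vec_t and every vec_* function are hypothetical stand-ins for a zarray-style container, the doubling growth policy is an assumption, and the two counters exist only so the strategies can be compared without timing anything.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a zarray-style container; not the apriltag API. */
typedef struct {
    size_t el_sz;   /* size of one element in bytes */
    int size;       /* number of elements in use */
    int alloc;      /* number of elements allocated */
    char *data;
} vec_t;

/* Counters so the two strategies can be compared without timing. */
static int n_memcpy = 0;
static int n_realloc = 0;

static vec_t *vec_create(size_t el_sz)
{
    vec_t *v = calloc(1, sizeof(*v));
    assert(v != NULL);
    v->el_sz = el_sz;
    return v;
}

static void vec_destroy(vec_t *v)
{
    free(v->data);
    free(v);
}

/* Grow the backing store (assumed policy: start at 8, then double). */
static void vec_ensure_capacity(vec_t *v, int capacity)
{
    if (capacity <= v->alloc)
        return;
    int alloc = v->alloc;
    while (alloc < capacity)
        alloc = alloc < 8 ? 8 : alloc * 2;
    char *p = realloc(v->data, (size_t)alloc * v->el_sz);
    assert(p != NULL);
    v->data = p;
    v->alloc = alloc;
    n_realloc++;
}

/* Append one element: a possible resize plus one memcpy. */
static void vec_add(vec_t *v, const void *el)
{
    vec_ensure_capacity(v, v->size + 1);
    memcpy(v->data + (size_t)v->size * v->el_sz, el, v->el_sz);
    n_memcpy++;
    v->size++;
}

/* Element-wise: copy out to tmp, then copy into dest, for every element. */
static void vec_add_all_elementwise(vec_t *dest, const vec_t *src)
{
    assert(dest->el_sz == src->el_sz);
    char *tmp = calloc(1, dest->el_sz);
    assert(tmp != NULL);
    for (int i = 0; i < src->size; i++) {
        memcpy(tmp, src->data + (size_t)i * src->el_sz, src->el_sz);
        n_memcpy++;
        vec_add(dest, tmp);
    }
    free(tmp);
}

/* Batched: grow dest once, then copy [start, end) in one memcpy. */
static void vec_add_range(vec_t *dest, const vec_t *src, int start, int end)
{
    assert(dest->el_sz == src->el_sz);
    assert(start >= 0 && start <= end && end <= src->size);
    int count = end - start;
    if (count == 0)
        return;
    vec_ensure_capacity(dest, dest->size + count);
    memcpy(dest->data + (size_t)dest->size * dest->el_sz,
           src->data + (size_t)start * src->el_sz,
           (size_t)count * dest->el_sz);
    n_memcpy++;
    dest->size += count;
}
```

With 100 int elements copied into an empty destination, the element-wise path in this sketch performs 200 memcpy calls and 5 reallocations, while the batched path performs one of each. The real numbers depend on the container's growth policy and on how full the destination already is.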

bouk · 2023-06-16T08:10:21Z

@christian-rauch addressed your comments!

christian-rauch requested changes Jun 15, 2023

bouk force-pushed the bouk/faster-cluster-merge branch from 0ec3dbb to e1b143c Compare June 16, 2023 08:10

bouk requested a review from christian-rauch June 16, 2023 11:15

christian-rauch approved these changes Jun 16, 2023

christian-rauch merged commit e2fd02f into AprilRobotics:master Jun 16, 2023

This was referenced Aug 4, 2023

Update upstream #287

Closed

sync upstream logivations/apriltag#4

Open
